(54) Telephone speech recognition



(57) A telephone speech recognition system is provided which achieves a high level of speech recognition
regardless of the various conditions of a telephone line.

The system comprises acoustic analyzers 4 and 5, and reference speech model storages 7 to 9
corresponding to line connection data. A telephone line interface 1 having a line connection data acquisition
function analyzes a call received from the telephone line to identify the country, the route, and other
information about the call, and transmits these line connection data to a line connection data processor 2. The
line connection data processor 2 selects one of the acoustic analyzers 4 and 5, and also one of the speech model
storages 7 to 9, in response to the line connection data from the interface 1. A speech pattern matcher 11
compares the acoustic vector train output by the selected acoustic analyzer with the speech models supplied from
the selected reference speech model storage to perform speech recognition.
2316575

TELEPHONE SPEECH RECOGNITION SYSTEM 

The present invention relates to a telephone speech recognition 
system and particularly, a telephone speech recognition system for use 
with a telephone line, network, or switchboard. 

A conventional speech recognition system connected to a telephone
line is explained with reference to Fig. 3. When the speech of a caller
received from a telephone line is to be recognized, it is transmitted via
a telephone line interface 31 to the speech recognition system. The
conventional speech recognition system essentially comprises an acoustic
analyzer 32, a speech pattern matcher 33, and a reference speech model
storage 34.

The speech of the caller introduced from the telephone line interface 31
is first fed to the acoustic analyzer 32. In the acoustic analyzer 32,
the speech is divided into frames at equal intervals of approximately
10 milliseconds (ms) using a Hamming window of about 25 ms, and each frame
is subjected to acoustic analysis such as cepstrum analysis to produce an
acoustic vector train, which is then supplied to the speech pattern
matcher 33. The reference speech model storage 34 saves speech models
such as HMMs (hidden Markov models). The speech pattern matcher 33
matches the acoustic vector train against the speech models saved in the
reference speech model storage 34. The sequence of symbols representing
the speech models with the highest likelihood is then output as the
speech recognition result.
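The framing and cepstrum analysis described above can be sketched as follows. This is a minimal, dependency-free illustration, not the patent's implementation: the function names, the choice of 12 real-cepstrum coefficients per frame, and the naive DFT (standing in for the FFT a practical analyzer would use) are all assumptions made for the example.

```python
import cmath
import math


def hamming(n):
    """Hamming window of length n (n >= 2)."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def split_into_frames(signal, fs, frame_ms=25.0, shift_ms=10.0):
    """Cut the signal into Hamming-windowed frames of ~25 ms taken every ~10 ms."""
    frame_len = int(round(fs * frame_ms / 1000))
    shift = int(round(fs * shift_ms / 1000))
    window = hamming(frame_len)
    return [
        [s * w for s, w in zip(signal[start:start + frame_len], window)]
        for start in range(0, len(signal) - frame_len + 1, shift)
    ]


def real_cepstrum(frame, n_ceps=12):
    """First n_ceps coefficients of the real cepstrum: IDFT of log|DFT(frame)|."""
    n = len(frame)
    # Naive O(n^2) DFT keeps the sketch dependency-free; an FFT would be used in practice.
    log_mag = [
        math.log(abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n)
                         for t in range(n))) + 1e-10)
        for k in range(n)
    ]
    # log|X[k]| is real and even in k for a real frame, so the inverse DFT is a cosine sum.
    return [
        sum(log_mag[k] * math.cos(2 * math.pi * k * q / n) for k in range(n)) / n
        for q in range(n_ceps)
    ]


def acoustic_vector_train(signal, fs, n_ceps=12):
    """One cepstral vector per frame: the acoustic vector train passed to the matcher."""
    return [real_cepstrum(frame, n_ceps) for frame in split_into_frames(signal, fs)]
```

At an assumed telephone sampling rate of 8 kHz, each frame holds 200 samples and frames start every 80 samples, so 0.1 s of speech yields 8 acoustic vectors.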

In this conventional system, however, speech recognition is carried
out without regard to the conditions of the line connection between the
caller at the other end of the line and the telephone line interface 31
at the entrance of the speech recognition system. More specifically, the
speech pattern matcher 33 matches the input against the speech models
saved in the reference speech model storage 34 without taking any of the
line connection conditions into account. Accordingly, when noise specific
to the telephone line is present, or when the frequency characteristic
of the line differs from that of a conventional telephone line (as with a
line used by a mobile telephone set), the desired level of speech
recognition is hard to achieve.

It is particularly difficult for such a conventional speech
recognition system to recognize speech in calls received over
international telephone lines, whose characteristics vary with the
terminal and line systems of each country.

It is therefore desirable to provide a telephone speech
recognition system capable of performing a high level of speech
recognition without being affected by the various conditions of a
telephone line, while eliminating the foregoing drawbacks of the prior art.

According to the present invention there is provided a telephone
speech recognition system for recognizing speech data received from a
telephone line, comprising: a telephone line interface connected to the
telephone line for detecting line connection data; a plurality of
acoustic analyzers having means for removing noise derived from the line
characteristics and/or the route characteristics; and a line connection
data processor, responsive to the line connection data from the telephone
line interface, for selecting among the acoustic analyzers. In addition,
a plurality of reference speech model storages for saving speech models
corresponding to the line characteristics and/or the route characteristics
may be provided.

A telephone speech recognition system embodying the present
invention may use the telephone line connection data to select an
appropriate acoustic analyzer and speech model, thus enhancing the
quality of speech recognition.

Embodiments of the invention will now be described by way
of example with reference to the drawings, in which:

Fig. 1 is a block diagram showing a first embodiment of the 
present invention; 

Fig. 2 is a block diagram showing a second embodiment of the 
present invention; and 

Fig. 3 is a block diagram showing a conventional speech 
recognition system. 

Embodiments of the invention will be described in detail with 
reference to the accompanying drawings. 



Fig. 1 is a block diagram showing one 
embodiment of the present invention. 

As shown, a telephone line interface 1 having a known line
connection data acquisition function is connected to a telephone line, a
network, or a switchboard. The telephone line interface 1 examines the
line connection data of a received call, including the telephone number
of the caller, the interconnection in a private branch exchange (PBX),
and, when the call is an international call, the country of the caller
and the route of transmission (e.g. via satellite links or undersea cable
links). The line connection data a extracted by the telephone line
interface 1 is transmitted to a line connection data processor 2, while
the speech data b of the call is supplied to a first switching unit 3.

In response to the line connection data a from the telephone line
interface 1, the line connection data processor 2 actuates the first
switching unit 3 and a second switching unit 6 to select either a first
acoustic analyzer 4 or a second acoustic analyzer 5. Each of the acoustic
analyzers 4 and 5 divides the speech data into frames at equal intervals
of approximately 10 ms using a Hamming window of about 25 ms, and
subjects each frame to acoustic analysis such as cepstrum analysis to
produce a train of acoustic vectors.
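A minimal sketch of how the line connection data processor 2 might choose an acoustic analyzer is given below. The record fields mirror the items listed above (caller number, PBX interconnection, country, route), but the class name, the placeholder values `"country-X"` and `"route-Y"`, and the decision rule are hypothetical: the patent does not specify the selection logic.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class LineConnectionData:
    """Hypothetical layout of line connection data 'a' from telephone line interface 1."""
    caller_number: str
    pbx_interconnection: Optional[str] = None
    country: Optional[str] = None  # known for international calls
    route: Optional[str] = None    # e.g. "satellite" or "cable"


# Hypothetical configuration: countries/routes whose calls carry the narrow-band
# noise that only the second (notch-filtering) acoustic analyzer removes.
NOISY_COUNTRIES = frozenset({"country-X"})
NOISY_ROUTES = frozenset({"route-Y"})


def select_acoustic_analyzer(data: LineConnectionData) -> int:
    """Reference numeral of the analyzer that switching units 3 and 6 should connect."""
    if data.country in NOISY_COUNTRIES or data.route in NOISY_ROUTES:
        return 5  # second acoustic analyzer, with noise removal
    return 4      # first acoustic analyzer, for noise-free speech
```

Keeping the rule as data (two sets) rather than code means new countries or routes can be added without touching the selection function.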

Depending on the route of transmission or the country of the
caller, the speech may either be free of noise or contain noise in a
particular frequency range of the voice signal; for example, calls from a
specific country in Europe carry such noise. To handle these two cases
respectively, the first acoustic analyzer 4 and the second acoustic
analyzer 5 are connected in parallel for selective use.

The first acoustic analyzer 4 analyzes speech that contains no
such noise. The second acoustic analyzer 5 has a notch filter or the like
for removing the noise, and produces an acoustic vector train from the
noise-free speech. The embodiment is not limited to the two acoustic
analyzers 4 and 5 shown; three or more acoustic analyzers may be used
when speech data carrying three or more kinds of noise are received.
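The notch filter of the second acoustic analyzer 5 could be realised, for example, as a second-order IIR notch. The sketch below uses the biquad notch from Robert Bristow-Johnson's Audio EQ Cookbook; the centre frequency `f0`, the quality factor `q`, and the function names are assumptions for illustration, since the patent names neither the noise frequency nor the filter design.

```python
import math


def notch_coefficients(f0, fs, q=30.0):
    """Biquad notch (Audio EQ Cookbook) centred on f0 Hz, normalised so that a0 == 1."""
    w0 = 2 * math.pi * f0 / fs
    alpha = math.sin(w0) / (2 * q)
    a0 = 1 + alpha
    b = [1 / a0, -2 * math.cos(w0) / a0, 1 / a0]
    a = [1.0, -2 * math.cos(w0) / a0, (1 - alpha) / a0]
    return b, a


def notch_filter(signal, f0, fs, q=30.0):
    """Apply the notch sample by sample (direct form I) before acoustic analysis."""
    b, a = notch_coefficients(f0, fs, q)
    x1 = x2 = y1 = y2 = 0.0  # previous inputs and outputs
    out = []
    for x in signal:
        y = b[0] * x + b[1] * x1 + b[2] * x2 - a[1] * y1 - a[2] * y2
        out.append(y)
        x2, x1 = x1, x
        y2, y1 = y1, y
    return out
```

The zeros of this filter lie on the unit circle at `f0`, so a steady tone at that frequency is cancelled once the transient decays, while frequencies well away from `f0` pass with gain close to 1. A larger `q` narrows the notch and removes less of the surrounding speech band, at the cost of a longer settling time.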
Three reference speech model storages, the first to third storages
7 to 9, are provided for saving speech models, e.g. HMMs, defined
according to the countries of callers and the routes of transmission.
A third switching unit 10 responds to a control signal from the line
connection data processor 2 by connecting one of the three reference
speech model storages 7 to 9 to the speech pattern matcher 11. The
speech pattern matcher 11 then compares the speech models d from the
selected reference speech model storage with the acoustic vector train c
transmitted through the second switching unit 6 to perform speech
recognition, and delivers the result.
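The comparison performed by the speech pattern matcher 11 can be illustrated with Viterbi scoring of left-to-right HMMs. In this sketch each reference speech model storage is represented as a dictionary from recognition symbols to word HMMs with one diagonal-Gaussian density per state; the fixed transition probability, the class and function names, and the dictionary layout are assumptions, not details taken from the patent.

```python
import math
from typing import Dict


class WordHMM:
    """Left-to-right HMM with one diagonal-Gaussian output density per state."""

    def __init__(self, means, variances, self_loop=0.6):
        self.means = means          # one mean vector per state
        self.variances = variances  # one variance vector per state
        self.log_stay = math.log(self_loop)        # stay in the same state
        self.log_next = math.log(1.0 - self_loop)  # advance to the next state


def log_gaussian(x, mean, var):
    """Log density of vector x under a diagonal Gaussian."""
    return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                      for xi, m, v in zip(x, mean, var))


def viterbi_log_likelihood(model, vectors):
    """Best-path log-likelihood, starting in the first state and ending in the last."""
    n_states = len(model.means)
    neg_inf = float("-inf")
    score = [neg_inf] * n_states
    score[0] = log_gaussian(vectors[0], model.means[0], model.variances[0])
    for x in vectors[1:]:
        new_score = [neg_inf] * n_states
        for j in range(n_states):
            best = score[j] + model.log_stay
            if j > 0:
                best = max(best, score[j - 1] + model.log_next)
            if best > neg_inf:
                new_score[j] = best + log_gaussian(x, model.means[j], model.variances[j])
        score = new_score
    return score[-1]  # -inf if there are fewer frames than states


def match(vectors, storage: Dict[str, WordHMM]) -> str:
    """Symbol whose model gives the acoustic vector train the highest likelihood."""
    return max(storage, key=lambda symbol: viterbi_log_likelihood(storage[symbol], vectors))
```

With the storages held in a hypothetical dictionary keyed by the reference numerals 7 to 9, the action of the third switching unit 10 reduces to an index, e.g. `match(train, storages[7])`.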

In this embodiment, the most appropriate acoustic analyzer and
speech model storage can be selected according to the line connection
data, including the country of the caller and/or the route of
transmission. This allows speech recognition to conform to the
characteristics of the telephone line and of the noise, thereby enhancing
the performance of the speech recognition system. The speech models saved
in the first to third speech model storages 7 to 9 may be improved in
quality through learning.

A second embodiment of the present invention will now be
described with reference to Fig. 2. The second embodiment is
substantially identical to the first embodiment but is characterized by a
different set of first to third reference speech model storages 12 to 14,
which are explained in more detail below.

In particular, the first and second speech model storages 12
and 13 save noise models. Speech may include silence pauses in which no
sound is made, and correct speech recognition requires that these silence
pauses be discriminated from the rest of the speech. However, when a call
travels over a particular route, an intrinsic noise is sometimes
superimposed on the silence pauses. To identify such intrinsic noises,
models of them are saved in the first and second speech model storages
12 and 13. The third switching unit 10 is activated by a control signal
from the line connection data processor 2 to connect either the first
speech model storage 12 or the second speech model storage 13 to the
speech pattern matcher 11. The third speech model storage 14 saves
speech models used regardless of the country of the caller and/or the
route of transmission; more specifically, these speech models, e.g. HMMs,
are identical to those saved in the speech model storage 34 shown in
Fig. 3. The speech pattern matcher 11 identifies the silence pauses in
the speech using the noise models supplied through the third switching
unit 10, and matches the speech against the speech models from the third
speech model storage 14 to recognize the voice sounds in the speech. The
result of the speech recognition is then delivered as an output.
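Silence detection against a noise model can be sketched as a per-frame likelihood test. The sketch assumes that each noise model is a single diagonal Gaussian over the acoustic vectors and that a frame counts as silence when its log-likelihood under that model reaches a threshold; the model form, the threshold rule and all names are assumptions rather than the patent's stated method.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class NoiseModel:
    """Diagonal-Gaussian model of the line noise heard during silence pauses."""
    mean: List[float]
    var: List[float]

    def log_likelihood(self, x):
        return -0.5 * sum(math.log(2 * math.pi * v) + (xi - m) ** 2 / v
                          for xi, m, v in zip(x, self.mean, self.var))


def silence_mask(vectors, noise, threshold):
    """True for frames the noise model explains well enough to be treated as silence."""
    return [noise.log_likelihood(x) >= threshold for x in vectors]


def speech_segments(mask):
    """Group consecutive non-silence frames into half-open (start, end) index ranges."""
    segments, start = [], None
    for i, silent in enumerate(mask):
        if not silent and start is None:
            start = i
        elif silent and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(mask)))
    return segments
```

Only the frames inside the returned segments would then be matched against the speech models; swapping the noise model (storage 12 or 13) changes which frames are absorbed as silence without touching the speech models.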

The noise models in the first and second speech model storages
12 and 13 can be improved through learning. In practice, the speech data
received from a telephone line is saved in memory means and, when the
telephone line is disconnected, is fed to the first acoustic analyzer 4
or the second acoustic analyzer 5 to extract its noise data, which is
then saved. Through this learning, the noise models saved in the first
and second speech model storages 12 and 13 can be improved.
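The post-call learning step might update a noise model's statistics incrementally. This sketch assumes a diagonal-Gaussian noise model whose mean and variance are re-estimated with Welford's running algorithm from the frames flagged as silence in the stored call; the class, the function names and the variance floor are hypothetical.

```python
class NoiseStatistics:
    """Running per-dimension mean and variance of noise vectors (Welford's algorithm)."""

    def __init__(self, dim):
        self.dim = dim
        self.count = 0
        self.mean = [0.0] * dim
        self.m2 = [0.0] * dim  # sum of squared deviations from the running mean

    def update(self, vector):
        self.count += 1
        for d in range(self.dim):
            delta = vector[d] - self.mean[d]
            self.mean[d] += delta / self.count
            self.m2[d] += delta * (vector[d] - self.mean[d])

    def variance(self, floor=1e-6):
        """Unbiased variance per dimension, floored so the model stays usable."""
        if self.count < 2:
            return [floor] * self.dim
        return [max(m / (self.count - 1), floor) for m in self.m2]


def learn_from_stored_call(stats, vectors, silence_mask):
    """After disconnection, fold the noise-only frames of a stored call into stats."""
    for vector, silent in zip(vectors, silence_mask):
        if silent:
            stats.update(vector)
    return list(stats.mean), stats.variance()
```

Because the statistics are cumulative, each disconnected call refines the same noise model rather than replacing it, which matches the idea of improving the stored models over time.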

Although in the embodiment described the speech model storage 14
saves speech models used regardless of the country of the caller and/or
the route of transmission, the invention is not limited to this: a group
of reference speech model storages may be provided for saving various
speech models corresponding to the countries of callers and the routes,
so that one of them can be selected by a control signal from the line
connection data processor 2.

The operation of the second embodiment will now be explained.
When a call is received from an international telephone line, the
telephone line interface 1 identifies the calling country or city and/or
the route of transmission from the telephone number of the caller. The
resulting line connection data a is transmitted to the line connection
data processor 2. Simultaneously, the telephone line interface 1 delivers
the speech data b to the first switching unit 3. In response to the
received line connection data a, the line connection data processor 2
sends selection control signals to the first, second and third switching
units 3, 6 and 10. For example, when the calling country or city is not
one of the specific ones, the control signals select the first acoustic
analyzer 4 and the first reference speech model storage 12.
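The identification of the calling country from the caller's telephone number, followed by the three selection signals, could look like the sketch below. It assumes the number arrives in E.164 form with a leading `+`; the country-code table is a small illustrative subset of ITU-T E.164 codes, and treating France as the "specific" country is purely hypothetical, chosen only to exercise both branches.

```python
from typing import Optional, Tuple

# Illustrative subset of ITU-T E.164 country calling codes (not exhaustive).
COUNTRY_CODES = {"1": "NANP", "33": "FR", "44": "GB", "49": "DE", "81": "JP", "353": "IE"}

# Hypothetical: countries whose calls need analyzer 5 and noise-model storage 13.
SPECIFIC_COUNTRIES = frozenset({"FR"})


def calling_country(number: str) -> Optional[str]:
    """Identify the calling country from an E.164 number such as '+441632960000'."""
    digits = number.lstrip("+")
    # E.164 country codes are 1 to 3 digits long and prefix-free, so at most one matches.
    for length in (1, 2, 3):
        country = COUNTRY_CODES.get(digits[:length])
        if country is not None:
            return country
    return None


def selection_signals(number: str) -> Tuple[int, int, int]:
    """(acoustic analyzer, noise-model storage, speech-model storage) to switch in."""
    if calling_country(number) in SPECIFIC_COUNTRIES:
        return 5, 13, 14
    return 4, 12, 14  # default case described above
```

The speech-model storage is 14 in both branches, reflecting that in this embodiment only the acoustic analyzer and the noise models depend on the line connection data.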

The speech data from the telephone line interface 1 is thus fed
to the appropriate acoustic analyzer, here the first acoustic analyzer 4,
where it is acoustically analyzed. Silence segments in the speech data
are then detected by referring to the noise models saved in the first
reference speech model storage 12, while speech recognition is performed
using the speech models saved in the reference speech model storage 14.

If the call is received from the specific country or via the
particular route, the second acoustic analyzer 5 and the second speech
model storage 13 are selected. As a result, calls received from the
specific country or via the particular route are subjected to speech
recognition using the most appropriate acoustic analyzer and speech
model, which increases the quality of their speech recognition.

As set forth above, the embodiment allows the acoustic analyzer
and the reference speech model storage to be selected according to the
country of the caller and/or the route of transmission, thus enhancing
the quality of speech recognition. For example, when a non-continuous
inserted signal is contained in a call from a public telephone booth in a
specific country, it can be eliminated by selecting the appropriate
acoustic analyzer. In domestic use, a call from a mobile telephone can be
identified by its key-station number and recognized using speech models
for mobile telephones, giving a higher level of speech recognition.

Embodiments of the present invention allow the line connection
data of a call, including the telephone number of the caller and/or the
route of transmission, to be used for selecting the acoustic analyzer and
the speech models. Speech recognition is therefore performed with
reference to the line characteristics and the noise characteristics, and
its quality is increased.
CLAIMS



1. A telephone speech recognition system for recognizing
speech data received from a telephone line, comprising:

a telephone line interface connected to the telephone line for
detecting line connection data;

a plurality of acoustic analyzers having means for removing
noise derived from at least one of the line characteristics and the route
characteristics; and

a line connection data processor for selecting among the acoustic
analyzers in response to the line connection data from the telephone line
interface.

2. A telephone speech recognition system according to claim 1,
further comprising:

a plurality of reference speech model storages for saving speech
models corresponding to at least one of the line characteristics and the
route characteristics, in which a selection signal from the line
connection data processor is used for selecting the reference speech
model storages.

3. A telephone speech recognition system according to claim 2, 
wherein the reference speech model storages save speech models 
attributed to the telephone line. 

4. A telephone speech recognition system according to claim 2, 
wherein the reference speech model storages save noise models 
appearing in silence segments in dialogue so that any silence segment in 
a dialogue can be recognized by a speech pattern matcher referring to the 
noise models. 




5. A telephone speech recognition system according to claim 3 
or 4, wherein the speech models saved in the reference speech model 
storages are updated by learning. 

6. A telephone speech recognition system as hereinbefore
described with reference to Figures 1 to 3 of the accompanying drawings.